Question 1 [2+2+2+2+2+2=12 marks total]


The following represents a sorted list of values for the age attribute from a set of 
data tuples: 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 
35, 35, 36, 40, 45, 46, 52, 70. 


(a) What is the mean of the data? 

(b) What is the median of the data? 

(c) What is the mode? Comment on the modality. 

(d) What are (roughly) the 1st (Q1) and 3rd (Q3) quartiles? 
(e) What is the Interquartile Range (IQR)? 
(f) What is the five-number summary? 
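
The quantities asked for in parts (a) to (f) can be checked with a short script. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative, and the quartile rule (the "exclusive" method, which places the p-th quantile at position (n + 1) * p) is an assumption, since the question only asks for rough quartiles and other conventions may give slightly different values.

```python
import statistics

# Sorted age values copied from the question.
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

mean_age = statistics.fmean(ages)        # (a) arithmetic mean
median_age = statistics.median(ages)     # (b) middle value of the sorted list
modes = statistics.multimode(ages)       # (c) every value tied for the highest count

# (d) Assumption: quartiles by the "exclusive" (n + 1) * p position rule.
q1, q2, q3 = statistics.quantiles(ages, n=4, method="exclusive")

iqr = q3 - q1                                               # (e)
five_number = (min(ages), q1, median_age, q3, max(ages))    # (f)

print("mean:", round(mean_age, 2))
print("median:", median_age)
print("modes:", modes, "-> number of modes:", len(modes))
print("Q1, Q3, IQR:", q1, q3, iqr)
print("five-number summary:", five_number)
```

The number of entries in `modes` indicates the modality asked about in part (c): one entry means unimodal, two means bimodal, and so on.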


Question 2 [2+4+2+2=10 marks total] 


The raw data set Accounts (Student, Mode, Faculty, Logins, Time, Downloads) is: 


Student   Mode   Faculty   Logins   Time   Downloads


albert    ext    Science   12       120    2.7
bazza     int    Science   6        67     1.1
cathy     ext    Arts      16       320    22.9
dave      ext    Arts      20       250    ee
daffy     int    Science   15       85     1.6
fredo     ext    Arts      10       50     0.9


where Time is connection time in minutes and Downloads are in megabytes. 


(a) List all cuboids in a data cube with dimensions Mode and Faculty. 
(b) Construct the base cuboid with aggregates SUM(Logins) and MAX(Time). 
(c) Construct the Faculty cuboid for the aggregate SUM(Logins). 



(d) Construct the apex cuboid for the aggregate SUM(Logins). 
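
Each cuboid is a group-by over some subset of the dimensions, so the constructions in parts (b) to (d) can be sketched with one grouping function. The code below is a minimal illustration in plain Python; the function name `cuboid` and the tuple layout are assumptions made for this sketch, and the Downloads column is left out because none of the requested aggregates use it.

```python
from collections import defaultdict
from itertools import combinations

# (Student, Mode, Faculty, Logins, Time) copied from the Accounts table.
accounts = [
    ("albert", "ext", "Science", 12, 120),
    ("bazza",  "int", "Science", 6,  67),
    ("cathy",  "ext", "Arts",    16, 320),
    ("dave",   "ext", "Arts",    20, 250),
    ("daffy",  "int", "Science", 15, 85),
    ("fredo",  "ext", "Arts",    10, 50),
]

COLUMN = {"Mode": 1, "Faculty": 2}  # position of each dimension in a row


def cuboid(rows, dims):
    """Group rows by the given dimensions.

    Returns {group key: (SUM(Logins), MAX(Time))}. An empty `dims`
    puts every row in one group, which is the apex cuboid.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[COLUMN[d]] for d in dims)
        groups[key].append(row)
    return {key: (sum(r[3] for r in g), max(r[4] for r in g))
            for key, g in groups.items()}


# (a) One cuboid per subset of the dimensions.
dimensions = ["Mode", "Faculty"]
all_cuboids = [list(c) for k in range(len(dimensions), -1, -1)
               for c in combinations(dimensions, k)]

base = cuboid(accounts, ["Mode", "Faculty"])   # (b) base cuboid
by_faculty = cuboid(accounts, ["Faculty"])     # (c) Faculty cuboid
apex = cuboid(accounts, [])                    # (d) apex cuboid

for name, result in [("base", base), ("Faculty", by_faculty), ("apex", apex)]:
    print(name, result)
```

With two dimensions there are 2^2 = 4 subsets, which is why `all_cuboids` has four entries, from the base cuboid down to the empty group-by.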
Question 3 [2+2+2+4=10 marks total] 


(a) List 4 OLAP operations. 
(b) Give one example of a concept hierarchy. 


(c) Name a method for training a multilayer neural network. What gets updated 
during training? 


(d) Compute the Euclidean distance between the points (22, 1, 42, 10) and 
(20, 0, 36, 8). 
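
The Euclidean distance in part (d) is the square root of the sum of squared coordinate differences. A minimal sketch of that calculation follows; the names `p` and `q` are chosen here for illustration only.

```python
import math

p = (22, 1, 42, 10)
q = (20, 0, 36, 8)

# Sum of squared differences, one term per coordinate.
squared = sum((a - b) ** 2 for a, b in zip(p, q))
distance = math.sqrt(squared)

print("sum of squares:", squared)
print("distance:", round(distance, 3))
```

On Python 3.8 or later, `math.dist(p, q)` should give the same value in a single call.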
Question 4 [2+8+8+2=20 marks total] 


Build a Naive Bayes classification model from the following table of six records: 


     a  b  c  d  category
R1 | N  Y  N  N  2
R2 | Se ye oe    1
R3 | Y  N  N  N  1
R4 | N  N  N  N  2
R5 | Y  N  N  Y  2
R6 | N  Y  Y  N  2


(a) Calculate P(c), the probability of finding category c, for all categories. 


(b) Calculate P(i|c), the probability of finding attribute i given category c, for all 
attributes and categories. 


(c) Let U=(Y,N,N,Y) be an unclassified record. Calculate P(U|c), the probability 
of finding the attribute set U given category c, for all categories. 


(d) To what category does the Naive Bayes classifier assign the attribute set U? 


Question 5 [4+4=8 marks total] 


Consider the following confusion matrix for a classification model with a percent 
error of 13.8% for Class 2, and a model accuracy of 88%. 


                          Predicted
                   Class 1   Class 2   Class 3
Actual   Class 1 |   30         0         3
         Class 2 |   x         25         2
         Class 3 |   1          y        41
(a) Find the value of x (round to nearest integer). 


(b) Find the value of y (round to nearest integer). 
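
Both unknowns follow from rearranging the two given rates. The sketch below assumes that "percent error for Class 2" means the fraction of actual Class 2 records (the Class 2 row) that are misclassified, and that accuracy is the diagonal sum divided by the total count; the question does not spell out either definition, so treat them as assumptions.

```python
class2_error = 0.138   # given: percent error for Class 2
accuracy = 0.88        # given: overall model accuracy

# Class 2 row is (x, 25, 2), so error = (x + 2) / (x + 25 + 2).
# Rearranged: x = (error * 27 - 2) / (1 - error).
x_exact = (class2_error * 27 - 2) / (1 - class2_error)
x = round(x_exact)

# Accuracy = correct / total, with correct = diagonal = 30 + 25 + 41.
correct = 30 + 25 + 41
known_total = (30 + 0 + 3) + (x + 25 + 2) + (1 + 41)   # every cell except y
y_exact = correct / accuracy - known_total
y = round(y_exact)

print("x:", round(x_exact, 3), "->", x)
print("y:", round(y_exact, 3), "->", y)
```

Part (b) depends on the rounded result of part (a), because x contributes to the total count used in the accuracy.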


Question 6 (hard) [10 marks total] 


Find the best attribute for the root node of a decision tree based on the following 
table of fish identification records: 


     length   width     colour   category
R1 | long     average   red
R2 | long     thin      blue
R3 | short    wide      blue
R4 | long     wide      blue
R5 | medium   average   green
R6 | short    thin      green
R7 | short    average   green
R8 | long     wide      red


DWWrrnne 


You will need to use the entropy impurity discussed in lectures. The entropy 
impurity of node N is: $i(N) = -\sum_c P(c) \log_2 P(c)$; where P(c) is the fraction of 
attributes at node N that are in category c. You will also need the impurity drop at 
node N, defined for the present case as: 
$\Delta i(N) = i(N) - P_1 i(N_1) - P_2 i(N_2) - P_3 i(N_3)$; 
where $N_k$ are the child nodes of N; $i(N_k)$ are their impurities; and $P_k$ is the 
fraction of attributes at node N that will go to node $N_k$. The values for i(N) are 
i(length) = 1.41, i(width) = i(colour) $\approx$ 1.56. You may also need: $\log_2 1 = 0$, 
$\log_2 0.5 = -1$, $\log_2 (1/3) \approx -1.59$, $\log_2 (2/3) \approx -0.58$. Hint: To make the task clearer 
draw a diagram first. 
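
The entropy impurity and impurity drop defined above can be sketched in a few lines. In the code below the function names `entropy` and `impurity_drop` are illustrative, not taken from the lectures. The attribute columns are copied from the table and are used only to reproduce the quoted values i(length), i(width) and i(colour); the two-category split at the end uses hypothetical labels "A" and "B" purely as a sanity check of the drop formula.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy impurity: -sum over distinct values of P * log2(P)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def impurity_drop(parent, children):
    """i(N) minus the weighted impurities of the child nodes.

    `parent` holds the labels at node N; `children` is a list of label
    lists, one per child node, that together partition `parent`.
    """
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)


# Attribute columns for R1..R8, copied from the table.
length = ["long", "long", "short", "long", "medium", "short", "short", "long"]
width = ["average", "thin", "wide", "wide", "average", "thin", "average", "wide"]
colour = ["red", "blue", "blue", "blue", "green", "green", "green", "red"]

for name, column in [("length", length), ("width", width), ("colour", colour)]:
    print(name, round(entropy(column), 2))

# Hypothetical labels: a split that separates two categories perfectly
# removes all of the parent's impurity.
toy_parent = ["A", "A", "B", "B"]
toy_drop = impurity_drop(toy_parent, [["A", "A"], ["B", "B"]])
print("toy drop:", toy_drop)
```

To apply this to the question, pass the category labels of the eight records as `parent` and, for each candidate attribute, the category labels grouped by that attribute's values as `children`; the attribute with the largest drop is the preferred root.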